Conceptual Clustering with Numeric-and-Nominal Mixed Data | A New Similarity Based System
نویسنده
چکیده
This paper presents a new Similarity Based Agglomerative Clustering(SBAC) algorithm that works well for data with mixed numeric and nominal features. A similarity measure, proposed by Goodall for biological taxonomy[13], that gives greater weight to uncommon feature-value matches in similarity computations and makes no assumptions of the underlying distributions of the feature-values, is adopted to de ne the similarity measure between pairs of objects. An agglomerative algorithm is employed to construct a concept tree, and a simple distinctness heuristic is used to extract a partition of the data. The performance of SBAC has been studied on real and arti cially generated data sets. Results demonstrate the e ectiveness of this algorithm in unsupervised discovery tasks. Comparisons with other schemes illustrate the superior performance of the algorithm.
منابع مشابه
Unsupervised Learning with Mixed Numeric and Nominal Data
ÐThis paper presents a Similarity-Based Agglomerative Clustering (SBAC) algorithm that works well for data with mixed numeric and nominal features. A similarity measure, proposed by Goodall for biological taxonomy [15], that gives greater weight to uncommon feature value matches in similarity computations and makes no assumptions of the underlying distributions of the feature values, is adopted...
متن کاملClustering Mixed Data via Diffusion Maps
Data clustering is a common technique for statistical data analysis, which is used in many fields, including machine learning, data mining, customer segmentation, trend analysis, pattern recognition and image analysis. Although many clustering algorithms have been proposed most of them deal with clustering of numerical data. Finding the similarity between numeric objects usually relies on a com...
متن کاملخوشهبندی خودکار دادههای مختلط با استفاده از الگوریتم ژنتیک
In the real world clustering problems, it is often encountered to perform cluster analysis on data sets with mixed numeric and categorical values. However, most existing clustering algorithms are only efficient for the numeric data rather than the mixed data set. In addition, traditional methods, for example, the K-means algorithm, usually ask the user to provide the number of clusters. In this...
متن کاملارائه یک الگوریتم خوشه بندی برای داده های دسته ای با ترکیب معیارها
Clustering is one of the main techniques in data mining. Clustering is a process that classifies data set into groups. In clustering, the data in a cluster are the closest to each other and the data in two different clusters have the most difference. Clustering algorithms are divided into two categories according to the type of data: Clustering algorithms for numerical data and clustering algor...
متن کامل